A Closer Look at Spatiotemporal Convolutions for Action Recognition

Authors

  • Du Tran
  • Heng Wang
  • Lorenzo Torresani
  • Jamie Ray
  • Yann LeCun
  • Manohar Paluri
Abstract

In this paper we discuss several forms of spatiotemporal convolutions for video analysis and study their effects on action recognition. Our motivation stems from the observation that 2D CNNs applied to individual frames of the video have remained solid performers in action recognition. In this work we empirically demonstrate the accuracy advantages of 3D CNNs over 2D CNNs within the framework of residual learning. Furthermore, we show that factorizing the 3D convolutional filters into separate spatial and temporal components yields significant advantages in accuracy. Our empirical study leads to the design of a new spatiotemporal convolutional block, “R(2+1)D”, which gives rise to CNNs that achieve results comparable or superior to the state of the art on Sports-1M, Kinetics, UCF101 and HMDB51.
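The factorization mentioned in the abstract can be illustrated with a small parameter count. The sketch below compares a full t x d x d 3D convolution with a spatial 1 x d x d convolution followed by a temporal t x 1 x 1 convolution. The abstract gives no block details, so the channel sizes, the function names, and the rule used to pick the intermediate channel count (chosen here so that both variants have roughly the same number of weights) are assumptions made for illustration only.

```python
# Illustrative sketch only: layer sizes and the mid-channel rule are
# assumptions, not details taken from the abstract.

def conv3d_params(c_in, c_out, t, d):
    """Weight count of a full t x d x d 3D convolution (bias ignored)."""
    return c_in * c_out * t * d * d


def matched_mid_channels(c_in, c_out, t, d):
    """Intermediate channel count that keeps the factorized block at or
    below the weight count of the full 3D convolution (assumed rule)."""
    return (t * d * d * c_in * c_out) // (d * d * c_in + t * c_out)


def conv2plus1d_params(c_in, c_out, t, d, c_mid):
    """Weight count of a 1 x d x d spatial convolution followed by a
    t x 1 x 1 temporal convolution (bias ignored)."""
    spatial = c_in * c_mid * d * d
    temporal = c_mid * c_out * t
    return spatial + temporal


# Hypothetical layer: 64 input channels, 64 output channels, 3 x 3 x 3 kernel.
c_in, c_out, t, d = 64, 64, 3, 3
full = conv3d_params(c_in, c_out, t, d)
c_mid = matched_mid_channels(c_in, c_out, t, d)
factored = conv2plus1d_params(c_in, c_out, t, d, c_mid)

print(f"full 3D conv weights:      {full}")
print(f"intermediate channels:     {c_mid}")
print(f"factorized (2+1)D weights: {factored}")
```

With a matched weight budget, the two variants differ in structure rather than size: the factorized form applies two convolutions in sequence, so a nonlinearity can be placed between the spatial and temporal steps.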

Similar resources

Rethinking Spatiotemporal Feature Learning For Video Understanding

In this paper we study 3D convolutional networks for video understanding tasks. Our starting point is the state-of-the-art I3D model of [3], which “inflates” all the 2D filters of the Inception architecture to 3D. We first consider “deflating” the I3D model at various levels to understand the role of 3D convolutions. Interestingly, we found that 3D convolutions at the top layers of the network c...

I'm No Longer a Child: A Closer Look at the Interaction Between Iranian EFL University Students' Identities and Their Academic Performance

Although university EFL students represent a wide array of social and cultural identities, their multiple and diverse identities are not usually considered in foreign language classrooms. This qualitative case study attempted to examine identity conflicts experienced by Iranian EFL learners in the university context. To this end, two Shiraz University students' identities were investigated. Sem...

A Closer Look to the Most Frequent Travelers’ Disease: A Systematic Update on Travelers’ Diarrhea

The present study highlights and reviews the most prevalent disease among travelers. It provides an update on epidemiology and the pathogens involved, along with a brief review of current evidence-based guidelines for prevention and treatment of this disease. A distinguishing feature of the current review is the discussion of the impacts of irritable bowel syn...

A closer look at rock physics models and their assisted interpretation in seismic exploration

Subsurface rocks and their fluid content along with their architecture affect reflected seismic waves through variations in their travel time, reflection amplitude, and phase within the field of exploration seismology. The combined effects of these factors make subsurface interpretation by using reflection waves very difficult. Therefore, assistance from other subsurface disciplines is needed i...

Appearance-and-Relation Networks for Video Classification

Spatiotemporal feature learning in videos is a fundamental and difficult problem in computer vision. This paper presents a new architecture, termed Appearance-and-Relation Network (ARTNet), to learn video representation in an end-to-end manner. ARTNets are constructed by stacking multiple generic building blocks, called SMART, whose goal is to simultaneously model appearance and relation f...

Journal:
  • CoRR

Volume: abs/1711.11248

Publication date: 2017